CLAIMS

What is claimed is: 

1. A method for coordinating the writing of data items to persistent storage, the method
comprising the steps of:

maintaining within a first node a first queue for dirty data items that need to be
written to persistent storage;

maintaining within the first node a second queue for dirty data items that need to be
written to persistent storage;

moving entries from said first queue to said second queue when the dirty data items
corresponding to the entries need to be transferred to a node other than said
first node; and

when selecting which data items to write to persistent storage, giving priority to data
items that correspond to entries in said second queue.



2. The method of Claim 1 wherein the step of moving entries includes moving an entry
from said first queue to said second queue in response to a message received by said
first node, wherein said message indicates that another node has requested the data
item that corresponds to said entry.
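
By way of illustration only, and not as part of the claims, the two-queue arrangement described above might be sketched in Python as follows. The class name `DirtyQueues`, its method names, and the use of string identifiers for data items are assumptions made for this example.

```python
from collections import deque


class DirtyQueues:
    """Hypothetical per-node bookkeeping for dirty data items."""

    def __init__(self):
        self.first = deque()   # dirty items awaiting an ordinary write
        self.second = deque()  # dirty items that another node is waiting for

    def mark_dirty(self, item_id):
        # A newly dirtied item starts in the first queue.
        if item_id not in self.first and item_id not in self.second:
            self.first.append(item_id)

    def on_remote_request(self, item_id):
        # A message says another node requested the item: move its entry.
        if item_id in self.first:
            self.first.remove(item_id)
            self.second.append(item_id)

    def next_to_write(self):
        # Entries in the second queue are given priority over the first.
        if self.second:
            return self.second.popleft()
        if self.first:
            return self.first.popleft()
        return None


queues = DirtyQueues()
for block in ("A", "B", "C"):
    queues.mark_dirty(block)
queues.on_remote_request("C")
write_order = [queues.next_to_write() for _ in range(3)]
```

In this sketch, item "C" is written first because its entry was moved to the second queue, and the remaining items follow in the order in which they were dirtied.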

3. A method for coordinating the writing of data items to persistent storage, the method
comprising the steps of:

maintaining a forced-write count for each of said data items;

incrementing the forced-write count of a data item whenever the data item is written
to persistent storage by one node for transfer of the data item to another node; and

selecting which dirty data items to write to persistent storage based on the write
counts associated with the data items.
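
As a minimal sketch, not forming part of the claims, the forced-write counting described above could be modeled in Python as shown below. The claim states only that selection is based on the counts; the specific policy used here (highest count first, ties broken by identifier) and all names are assumptions for the example.

```python
from collections import defaultdict

forced_writes = defaultdict(int)  # item id -> forced-write count
dirty = set()                     # ids of items that are currently dirty


def record_forced_write(item_id):
    # The item was written to persistent storage so that it could be
    # transferred to another node: bump its count; it is no longer dirty.
    forced_writes[item_id] += 1
    dirty.discard(item_id)


def pick_writes(batch_size):
    # Assumed policy: prefer items with the highest forced-write count.
    ranked = sorted(dirty, key=lambda i: (-forced_writes[i], i))
    return ranked[:batch_size]


# Item "B" is forced to disk twice for transfers, then everything is dirtied.
for _ in range(2):
    dirty.add("B")
    record_forced_write("B")
dirty.update({"A", "B", "C"})
chosen = pick_writes(2)
```

Here "B" is chosen ahead of "A" and "C" because it has repeatedly required a forced write before transfer.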

4. The method of Claim 3 further comprising the steps of:

storing dirty data items that have forced-write counts above a certain threshold in a
particular queue; and

when selecting dirty data items to write to persistent storage, giving priority to data
items stored in said particular queue.

5. A method for managing information about where to begin recovery after a failure, the
method comprising the steps of:

in a particular node of a multiple-node system, maintaining both

a single-failure queue that indicates where within a recovery log to begin
recovery after a failure of said node, and

a multiple-failure queue that indicates where within said recovery log to begin
recovery after a failure of said node and one or more other nodes in
said multiple-node system;

in response to a dirty data item being written to persistent storage, removing an entry
for said data item from both said single-failure queue and said multiple-failure
queue; and

in response to a dirty data item being sent to another node of said multiple-node
system without first being written to persistent storage, removing an entry for
said data item from said single-failure queue without removing the entry for
said data item from said multiple-failure queue.

6. The method of Claim 5 further comprising the step of sending the dirty data item to
another node to allow removal of the entry from said single-failure queue without the
other node requesting the dirty data item.

7. The method of Claim 5 further comprising the steps of:

after a single node failure, applying said recovery log beginning at a position in said
recovery log associated with the single-failure queue; and

after a multiple node failure, applying said recovery log beginning at a position in
said recovery log associated with the multiple-failure queue.



8. The method of Claim 5 wherein:

said single-failure queue and said multiple-failure queue are implemented by a single
combined queue; and

the step of removing an entry for said data item from said single-failure queue
without removing the entry for said data item from said multiple-failure queue
includes marking an entry for said data item in said combined queue without
removing the entry for said data item from said combined queue.

9. The method of Claim 5 wherein said single-failure queue and said multiple-failure
queue are implemented as two separate queues.
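
For illustration only, the single-failure and multiple-failure queues recited above might be realized as one combined queue with a per-entry mark, as in the following Python sketch. The class `CombinedQueue`, the `sent` flag, and the use of an integer log position (`lsn`) recorded when an item is first dirtied are assumptions for this example.

```python
class CombinedQueue:
    """One queue serving as both the single- and multiple-failure queue."""

    def __init__(self):
        # item id -> {"lsn": position in recovery log, "sent": marked or not}
        self.entries = {}

    def add(self, item_id, lsn):
        self.entries.setdefault(item_id, {"lsn": lsn, "sent": False})

    def on_written(self, item_id):
        # Written to persistent storage: the entry leaves both logical queues.
        self.entries.pop(item_id, None)

    def on_sent_without_write(self, item_id):
        # Sent to another node without a write: mark the entry so that it
        # leaves only the single-failure view and stays in the combined queue.
        if item_id in self.entries:
            self.entries[item_id]["sent"] = True

    def recovery_start(self, multiple_failure):
        # Earliest log position still relevant for the given failure type.
        lsns = [e["lsn"] for e in self.entries.values()
                if multiple_failure or not e["sent"]]
        return min(lsns) if lsns else None


queue = CombinedQueue()
queue.add("A", 10)
queue.add("B", 20)
queue.add("C", 30)
queue.on_sent_without_write("A")
queue.on_written("B")
single_start = queue.recovery_start(multiple_failure=False)
multi_start = queue.recovery_start(multiple_failure=True)
```

In this sketch, recovery after a single-node failure can begin at position 30, while recovery after a multiple-node failure must begin at position 10, because item "A" was transferred without ever being written to persistent storage.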

10. A method for recovering after a failure, the method comprising the steps of:

determining whether the failure involves only one node; and

if the failure involves only said one node, then performing recovery by applying a
recovery log of said node beginning at a first point in the recovery log; and

if the failure involves one or more nodes in addition to said one node, then
performing recovery by applying said recovery log of said node beginning at a
second point in the recovery log;

wherein said first point is different from said second point.

11. The method of Claim 10 wherein:

the first point is determined, at least in part, by which data items that were dirtied by
said node reside in caches in other nodes; and

the second point is determined, at least in part, by which data items that were dirtied
by said node have been persistently stored.
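
The choice between the two starting points might be sketched as follows. This is a hypothetical Python illustration, not part of the claims; the representation of the recovery log as a list of `(position, record)` pairs and the function name are assumptions.

```python
def records_to_apply(log, node, failed_nodes, first_point, second_point):
    """Return the log records to apply, starting at the appropriate point.

    log          -- list of (position, record) pairs for `node`
    failed_nodes -- set of nodes involved in the failure
    first_point  -- start position when only `node` failed
    second_point -- start position when other nodes failed as well
    """
    only_this_node = failed_nodes == {node}
    start = first_point if only_this_node else second_point
    return [record for position, record in log if position >= start]


log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
single = records_to_apply(log, "n1", {"n1"}, first_point=3, second_point=1)
multi = records_to_apply(log, "n1", {"n1", "n2"}, first_point=3, second_point=1)
```

With these example values, a single-node failure applies only the last two records, whereas a failure that also involves node "n2" applies the whole log.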

12. A method for recovering after a failure, the method comprising the steps of:

if it is unclear whether a particular version of a data item has been written to disk,
then performing the steps of:

without attempting to recover said data item, marking dirtied cached versions
of said data item that would have been covered if said particular
version was written to disk;

when a request is made to write one of said dirtied cached versions to disk,
determining which version of said data item is already on disk; and

if said particular version of said data item is already on disk, then not writing
said one of said dirtied cached versions to disk.



13. The method of Claim 12 further comprising the step of, if said particular version of
said data item is not already on disk, then recovering said data item.

14. The method of Claim 12 further comprising the step of, if said particular version of
said data item is already on disk, then informing nodes that contain said dirtied
cached versions of the data item that said dirtied cached versions are covered by a
write-to-disk operation.
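
As a rough sketch only, the deferred handling of an uncertain write described above could look like the following in Python. It assumes that versions are comparable integers and that a cached version is "covered" by the particular version when it is not newer than that version; these assumptions, and all names, are not taken from the claims.

```python
def mark_possibly_covered(cached_versions, suspect_version):
    # Mark cached versions that would have been covered if the suspect
    # version had in fact reached disk. No recovery is attempted yet.
    return {v for v in cached_versions if v <= suspect_version}


def on_write_request(version, marked, suspect_version, read_disk_version):
    # Called when a node asks to write `version` to disk.
    if version not in marked:
        return "write"
    if read_disk_version() == suspect_version:
        # The suspect version is on disk: skip the write and tell the
        # nodes holding marked versions that they are covered.
        return "skip-and-notify-covered"
    # The suspect version never reached disk: recover the data item.
    return "recover-item"


marked = mark_possibly_covered([3, 5, 8], suspect_version=6)
covered = on_write_request(5, marked, 6, read_disk_version=lambda: 6)
lost = on_write_request(5, marked, 6, read_disk_version=lambda: 4)
unmarked = on_write_request(8, marked, 6, read_disk_version=lambda: 6)
```

The disk is consulted only when a write of a marked version is actually requested, which is what allows the uncertainty to be resolved lazily.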

15. A method for recovering a current version of a data item after a failure in a system
that includes multiple caches, the method comprising the steps of:

modifying the data item in a first node of said multiple caches to create a modified
data item;

sending the modified data item from said first node to a second node of said multiple
caches without durably storing the modified data item from said first node to
persistent storage;

after said modified data item has been sent from said first node to said second node
and before said data item in said first node has been covered by a write-to-disk
operation, discarding said data item in said first node; and

after said failure, reconstructing the current version of said data item by applying
changes to the data item on persistent storage based on merged redo logs
associated with all of said multiple caches.
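
The reconstruction step above can be illustrated, under stated assumptions, by merging per-cache redo logs and replaying them against the on-disk version. In this Python sketch each redo record is a tuple `(sequence, item_id, change)`, where `sequence` is an assumed system-wide ordering number and `change` is a function applied to the item value; neither representation is specified by the claims.

```python
import heapq


def reconstruct(on_disk_value, redo_logs, item_id):
    # Each per-cache log is already ordered by sequence number, so the
    # logs can be merged lazily into one globally ordered stream.
    merged = heapq.merge(*redo_logs, key=lambda record: record[0])
    value = on_disk_value
    for _sequence, record_item, change in merged:
        if record_item == item_id:
            value = change(value)
    return value


# The first node modified X and sent it on; the second node modified it again.
log_node1 = [(1, "X", lambda v: v + 1), (4, "X", lambda v: v * 2)]
log_node2 = [(2, "X", lambda v: v + 10), (3, "Y", lambda v: 0)]
current = reconstruct(0, [log_node1, log_node2], "X")
```

Because no single cache's log contains every change to "X", the current version (22 in this example) can be obtained only from the merged logs.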

16. The method of Claim 15 further comprising the steps of:

maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a
locally-dirty checkpoint queue;

wherein the globally-dirty data items associated with entries in the globally-dirty
checkpoint queue are not retained until covered by write-to-disk operations;

determining, for each cache, a checkpoint based on a lower of a first-dirtied time of
the entry at the head of the locally-dirty checkpoint queue and the first-dirtied
time of the entry at the head of the globally-dirty checkpoint queue; and

after said failure, determining where to begin processing the redo log associated with
each cache based on the checkpoint determined for said cache.
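
A minimal Python sketch of the checkpoint computation described above follows. It assumes that each queue holds first-dirtied times in ascending order and that redo records carry comparable times; the function names are hypothetical.

```python
import bisect
from collections import deque


def cache_checkpoint(locally_dirty, globally_dirty):
    # The checkpoint is the lower of the first-dirtied times found at the
    # heads of the two queues (empty queues are ignored).
    heads = [q[0] for q in (locally_dirty, globally_dirty) if q]
    return min(heads) if heads else None


def redo_start_index(log_times, checkpoint):
    # Index of the first redo record at or after the checkpoint.
    if checkpoint is None:
        return len(log_times)  # nothing is dirty, so nothing to redo
    return bisect.bisect_left(log_times, checkpoint)


local_queue = deque([40, 55])
global_queue = deque([25, 60])
checkpoint = cache_checkpoint(local_queue, global_queue)
start = redo_start_index([10, 20, 25, 30, 40], checkpoint)
```

In this example the globally-dirty entry first dirtied at time 25 holds the checkpoint back, so redo processing begins at the third log record.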

17. The method of Claim 15 further comprising the steps of:

maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a
locally-dirty checkpoint queue;

wherein the globally-dirty data items associated with entries in the globally-dirty
checkpoint queue are not retained until covered by write-to-disk operations;

maintaining, for each cache, a first checkpoint record for the locally-dirty checkpoint
queue that indicates a first time, where all changes made to data items that are
presently dirty in the cache prior to the first time have been recorded on a
version of the data item that is on persistent storage;

maintaining, for each cache, a second checkpoint record for the globally-dirty
checkpoint queue, wherein the second checkpoint record includes a list of data
items that were once dirtied in the cache but have since been transferred out
and not written to persistent storage; and

after said failure, determining where to begin processing the redo log associated with
each cache based on the first checkpoint record and said second checkpoint
record for said cache.

18. The method of Claim 17 wherein the step of processing the redo log comprises the 
steps of: 

determining a starting position for scanning the redo log based on a lesser of 

a position in the redo log as determined by the first checkpoint record and 
the positions in the log as determined by the earliest change made to the list of
the data items in the second checkpoint record; and 

during recovery, for the portion of the redo log between the position indicated by the 
global checkpoint record to the position indicated by the local checkpoint 
record, considering for potential redo only those log records that correspond to 
the data items identified in the global checkpoint record. 
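
The filtered scan described above might be sketched in Python as follows. The redo log is assumed to be a list of `(position, item_id)` pairs, and the global checkpoint record is assumed to map each listed item to the position of its earliest change; these representations and names are illustrative assumptions only.

```python
def records_to_consider(redo_log, local_position, global_items):
    # Start at the lesser of the local checkpoint position and the earliest
    # change position of any item listed in the global checkpoint record.
    global_position = min(global_items.values(), default=local_position)
    start = min(local_position, global_position)

    selected = []
    for position, item_id in redo_log:
        if position < start:
            continue
        # Before the local checkpoint position, only the items listed in
        # the global checkpoint record are candidates for redo.
        if position < local_position and item_id not in global_items:
            continue
        selected.append((position, item_id))
    return selected


redo_log = [(1, "A"), (2, "B"), (3, "C"), (4, "B"), (5, "A"), (6, "D")]
candidates = records_to_consider(redo_log, local_position=5,
                                 global_items={"B": 2})
```

In this example the scan starts at position 2, skips the record for "C" at position 3 because "C" is not listed in the global checkpoint record, and considers every record from position 5 onward.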

19. A computer-readable medium carrying instructions for coordinating the writing of
data items to persistent storage, the instructions comprising instructions for
performing the steps of:

maintaining within a first node a first queue for dirty data items that need to be
written to persistent storage;

maintaining within the first node a second queue for dirty data items that need to be
written to persistent storage;

moving entries from said first queue to said second queue when the dirty data items
corresponding to the entries need to be transferred to a node other than said
first node; and

when selecting which data items to write to persistent storage, giving priority to data
items that correspond to entries in said second queue.

20. The computer-readable medium of Claim 19 wherein the step of moving entries 
includes moving an entry from said first queue to said second queue in response to a 
message received by said first node, wherein said message indicates that another node 
has requested the data item that corresponds to said entry. 

21. A computer-readable medium carrying instructions for coordinating the writing of
data items to persistent storage, the instructions comprising instructions for
performing the steps of:

maintaining a forced-write count for each of said data items;

incrementing the forced-write count of a data item whenever the data item is written
to persistent storage by one node for transfer of the data item to another node; and

selecting which dirty data items to write to persistent storage based on the write
counts associated with the data items.



22. The computer-readable medium of Claim 21 further comprising instructions for
performing the steps of:

storing dirty data items that have forced-write counts above a certain threshold in a
particular queue; and

when selecting dirty data items to write to persistent storage, giving priority to data
items stored in said particular queue.

23. A computer-readable medium carrying instructions for managing information about 
where to begin recovery after a failure, the instructions comprising instructions for 
performing the steps of: 

in a particular node of a multiple-node system, maintaining both 

a single-failure queue that indicates where within a recovery log to begin 

recovery after a failure of said node, and 
a multiple-failure queue that indicates where within said recovery log to begin
recovery after a failure of said node and one or more other nodes in 
said multiple-node system; 
in response to a dirty data item being written to persistent storage, removing an entry 
for said data item from both said single-failure queue and said multiple-failure 
queue; and 

in response to a dirty data item being sent to another node of said multiple-node 

system without first being written to persistent storage, removing an entry for 
said data item from said single-failure queue without removing the entry for 
said data item from said multiple-failure queue. 

24. The computer-readable medium of Claim 23 further comprising instructions for
performing the step of sending the dirty data item to another node to allow removal of 
the entry from said single-failure queue without the other node requesting the dirty 
data item. 

25. The computer-readable medium of Claim 23 further comprising instructions for
performing the steps of:

after a single node failure, applying said recovery log beginning at a position in said
recovery log associated with the single-failure queue; and

after a multiple node failure, applying said recovery log beginning at a position in
said recovery log associated with the multiple-failure queue.

50277-1776 (OE) 2001-026-02)

ORACLE CONFIDENTIAL

26. The computer-readable medium of Claim 23 wherein:

said single-failure queue and said multiple-failure queue are implemented by a single
combined queue; and

the step of removing an entry for said data item from said single-failure queue
without removing the entry for said data item from said multiple-failure queue
includes marking an entry for said data item in said combined queue without
removing the entry for said data item from said combined queue.



27. The computer-readable medium of Claim 23 wherein said single-failure queue and
said multiple-failure queue are implemented as two separate queues.



28. A computer-readable medium carrying instructions for recovering after a failure, the
instructions comprising instructions for performing the steps of:

determining whether the failure involves only one node; and

if the failure involves only said one node, then performing recovery by applying a
recovery log of said node beginning at a first point in the recovery log; and

if the failure involves one or more nodes in addition to said one node, then
performing recovery by applying said recovery log of said node beginning at a
second point in the recovery log;

wherein said first point is different from said second point.

29. The computer-readable medium of Claim 28 wherein:

the first point is determined, at least in part, by which data items that were dirtied by
said node reside in caches in other nodes; and

the second point is determined, at least in part, by which data items that were dirtied
by said node have been persistently stored.

30. A computer-readable medium carrying instructions for recovering after a failure, the 
instructions comprising instructions for performing the steps of: 
if it is unclear whether a particular version of a data item has been written to disk,
then performing the steps of 

without attempting to recover said data item, marking dirtied cached versions 

of said data item that would have been covered if said particular 

version was written to disk; 
when a request is made to write one of said dirtied cached versions to disk, 

determining which version of said data item is already on disk; and 
if said particular version of said data item is already on disk, then not writing 

said one of said dirtied cached versions to disk. 



31. The computer-readable medium of Claim 30 further comprising instructions for
performing the step of, if said particular version of said data item is not already on 
disk, then recovering said data item. 

32. The computer-readable medium of Claim 30 further comprising instructions for
performing the step of, if said particular version of said data item is already on disk,
then informing nodes that contain said dirtied cached versions of the data item that 
said dirtied cached versions are covered by a write-to-disk operation. 

33. A computer-readable medium carrying instructions for recovering a current version of
a data item after a failure in a system that includes multiple caches, the instructions
comprising instructions for performing the steps of:

modifying the data item in a first node of said multiple caches to create a modified
data item;

sending the modified data item from said first node to a second node of said multiple
caches without durably storing the modified data item from said first node to
persistent storage;

after said modified data item has been sent from said first node to said second node
and before said data item in said first node has been covered by a write-to-disk
operation, discarding said data item in said first node; and

after said failure, reconstructing the current version of said data item by applying
changes to the data item on persistent storage based on merged redo logs
associated with all of said multiple caches.

34. The computer-readable medium of Claim 33 further comprising instructions for
performing the steps of:

maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a
locally-dirty checkpoint queue;

wherein the globally-dirty data items associated with entries in the globally-dirty
checkpoint queue are not retained until covered by write-to-disk operations;

determining, for each cache, a checkpoint based on a lower of a first-dirtied time of
the entry at the head of the locally-dirty checkpoint queue and the first-dirtied
time of the entry at the head of the globally-dirty checkpoint queue; and

after said failure, determining where to begin processing the redo log associated with
each cache based on the checkpoint determined for said cache.

35. The computer-readable medium of Claim 33 further comprising instructions for
performing the steps of:

maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a
locally-dirty checkpoint queue;

wherein the globally-dirty data items associated with entries in the globally-dirty
checkpoint queue are not retained until covered by write-to-disk operations;

maintaining, for each cache, a first checkpoint record for the locally-dirty checkpoint
queue that indicates a first time, where all changes made to data items that are
presently dirty in the cache prior to the first time have been recorded on a
version of the data item that is on persistent storage;

maintaining, for each cache, a second checkpoint record for the globally-dirty
checkpoint queue, wherein the second checkpoint record includes a list of data
items that were once dirtied in the cache but have since been transferred out
and not written to persistent storage; and

after said failure, determining where to begin processing the redo log associated with
each cache based on the first checkpoint record and said second checkpoint
record for said cache.

36. The computer-readable medium of Claim 35 wherein the step of processing the redo
log comprises the steps of:

determining a starting position for scanning the redo log based on a lesser of

a position in the redo log as determined by the first checkpoint record and

the positions in the log as determined by the earliest change made to the list of
the data items in the second checkpoint record; and

during recovery, for the portion of the redo log between the position indicated by the
global checkpoint record to the position indicated by the local checkpoint
record, considering for potential redo only those log records that correspond to
the data items identified in the global checkpoint record.